Data-driven speech representations for NMF-based word learning
نویسندگان
چکیده
State-of-the-art solutions in ASR often rely on large amounts of expert prior knowledge, which is undesirable in some applications. In this paper, we consider a NMFbased framework that learns a small vocabulary of words directly from input data, without prior knowledge such as phone sets and dictionaries. In the context of this learning scheme, we compare several spectral representations of speech. Where necessary, we propose changes to their derivation to avoid the usage of prior linguistic knowledge. Also, in a comparison of several acoustic modelling techniques, we determine what model properties are beneficial to the framework’s performance.
منابع مشابه
Discovering hierarchical speech features using convolutional non-negative matrix factorization
Discovering a representation that reflects the structure of a dataset is a first step for many inference and learning methods. This paper aims at finding a hierarchy of localized speech features that can be interpreted as parts. Non-negative matrix factorization (NMF) has been proposed recently for the discovery of parts-based localized additive representations. Here, I propose a variant of thi...
متن کاملDiscovering Convolutive Speech Phones using Sparseness and Non-Negativity Constraints
Discovering a representation that allows auditory data to be parsimoniously represented is useful for many machine learning and signal processing tasks. Such a representation can be constructed by Nonnegative Matrix Factorisation (NMF), which is a method for finding parts-based representations of non-negative data. Here, we present an extension to convolutive NMF that includes a sparseness cons...
متن کاملImplementation and test of activation-verification mechanisms
and embedding in ACORNS We describe a bottom-up, activation-based paradigm for continuous speech recognition. Speech is represented by co-occurrence statistics of acoustic events over an analysis window of variable length, leading to a vector representation of high but fixed dimension called “Histogram of Acoustic Cooccurrence” (HAC). During training, recurring acoustic patterns are discovered ...
متن کاملA Deep Non-Negative Matrix Factorization Neural Network
Recently, deep neural network algorithms have emerged as one of the most successful machine learning strategies, obtaining state of the art results for speech recognition, computer vision, and classification of large data sets. Their success is due to advancement in computing power, availability of massive amounts of data and the development of new computational techniques. Some of the drawback...
متن کاملDiscovering Words from Continuous Speech
Modern speech recognizers rely on preprogrammed knowledge of specific languages and extensive examples including annotations. Children however are remarkably welladapted to learning language without such help, learning from examples of the speech itself and from the environment in which they live. Modelling this learning process is a very interesting but also a very complex topic which encompas...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012